Abstract
We have studied in depth a problem of adaptive control with incomplete
observations, in which the state is a finite state Markov process.

I. SUMMARY OF RESEARCH PROGRESS AND RESULTS


During  the  twenty  months  of  research  supported  by  this  grant, 
significant  progress  has  been  made  in  a  number  of  aspects  of  stochastic 
systems.  In  this  section  this  progress  is  summarized,  and  reference 
is  made  to  the  resulting  publications  listed  in  Section  II. 

A.  Adaptive  Stochastic  Control  with  Complete  Observations 

Our work on the adaptive stochastic control of queues with complete
observations has continued. In [1], the priority assignment (or dynamic
scheduling) problem in a queueing system with unknown arrival and service
rates is considered. The problem is solved for the long-term average cost
criterion with linear cost rates, and the optimality of our proposed
adaptive control algorithm is shown. A distance-measures approach to the
problems of identification and approximation of queueing systems is
presented in [2]; this approach combines ideas from statistical
robustness, information-type measures, and parameter continuity of
stochastic processes.

In [3], we have considered general discounted-reward finite state
Markov decision processes which depend on unknown parameters. An
adaptive policy inspired by the nonstationary value iteration (NVI)
scheme of Federgruen and Schweitzer is proposed; this is a variant of
the usual method of successive approximations. It is shown that this
adaptive policy is asymptotically discount optimal. This NVI policy
is compared with the certainty equivalent or naive feedback control
(NFC) policy. The NFC requires computation and storage of the optimal
policy for all values of the parameter $\theta$; this represents
considerable off-line computation and considerable storage, particularly
if the parameter set is not finite. On the other hand, the NVI policy
requires more on-line computation.
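
The on-line work performed by a value-iteration-based adaptive policy can be illustrated with a minimal sketch. It is not the algorithm of [3]; it only shows the general idea of taking one successive-approximation step per time period using whatever parameter estimate is currently available. The callbacks `reward` and `trans`, the discount factor `beta`, and the dictionary representation of the value function are all assumptions made for illustration.

```python
def nvi_step(v, theta, reward, trans, states, actions, beta):
    """One value-iteration step under the current parameter estimate theta.

    reward(s, a, theta) and trans(s, a, theta) (a dict: next state ->
    probability) are hypothetical model callbacks; beta in (0, 1) is the
    discount factor. Returns the updated value function and, for each
    state, the action that is greedy with respect to the old one.
    """
    new_v, policy = {}, {}
    for s in states:
        best_a, best_q = None, float("-inf")
        for a in actions:
            q = reward(s, a, theta) + beta * sum(
                p * v[s2] for s2, p in trans(s, a, theta).items())
            if q > best_q:
                best_a, best_q = a, q
        new_v[s], policy[s] = best_q, best_a
    return new_v, policy
```

In an adaptive setting one would call `nvi_step` once per period with the latest estimate of $\theta$ and apply `policy[s]` in the observed state `s`; no policy has to be precomputed and stored for every parameter value, at the price of this computation being done on-line.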


B.  Adaptive  Stochastic  Control  with  Incomplete  Observations 

The  stochastic  adaptive  control  problems  solved  to  date  had  all, 
with  the  exception  of  those  for  linear  systems,  involved  the  assumption 
of  complete  (noiseless)  state  observations.  As  we  proposed,  we  have 
begun  a  major  new  direction  of  research  involving  adaptive  estimation 
and  control  problems  for  stochastic  systems  with  incomplete  (or  noisy) 
observations  of  the  state.  We  have  already  been  successful  in  obtaining 
some  important  new  results;  the  first  of  these  are  reported  in  [4].  In 
[4],  we  consider  discounted  Markov  decision  processes  with  incomplete 
state  information  and  depending  on  unknown  parameters.  The  process  is 
first  transformed  into  a  completely  observed  Markov  decision  process 
and  then  (i)  we  use  conditional  least  squares  estimation  to  obtain  a 
strongly  consistent  parameter  estimation  scheme,  and  this  is  combined 
with a nonstationary value-iteration procedure to obtain (ii) approximations
converging uniformly to the optimal reward function, and
(iii)  asymptotically  optimal  adaptive  policies. 

This  paper  provides  important  general  results,  but  the  specific 
structure  of  the  parameter  estimation  scheme  must  be  studied  in  more 
detail.  For  this  reason,  we  have  begun  an  in-depth  investigation  into 
problems  in  which  the  state  is  a  discrete  time  finite  state 
Markov  process  [8].  In  particular,  the  state  is  a  finite  state 
Markov chain $x_t \in \{\gamma_1, \ldots, \gamma_n\}$ with primitive
transition matrix $Q$. The observation process is $y_t \in \{0, 1\}$.
If $Q$ is known, there is a finite dimensional recursive filter for the
conditional probability vector

$$p_{t+1|t} = (p^1_{t+1|t}, \ldots, p^n_{t+1|t})^T, \quad \text{where} \quad p^i_{t+1|t} = P(x_{t+1} = \gamma_i \mid y_0, \ldots, y_t).$$
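
A recursive filter of this kind can be sketched as follows. The summary does not state the observation model, so the sketch assumes a hypothetical one in which `c[i]` is the probability that the observation equals 1 when the chain is in state `i`; `Q` is taken to be row-stochastic. Under those assumptions the one-step predictor is a Bayes correction followed by propagation through `Q`.

```python
def predict_next(p, y, Q, c):
    """One filter step: p_{t|t-1} and y_t -> p_{t+1|t}.

    p : list, p[i] = P(x_t = state i | y_0, ..., y_{t-1})
    y : observation y_t in {0, 1}
    Q : row-stochastic matrix, Q[i][j] = P(x_{t+1} = j | x_t = i)
    c : ASSUMED observation model, c[i] = P(y_t = 1 | x_t = state i)
    """
    n = len(p)
    # Bayes correction: weight each state by the likelihood of y_t.
    like = [c[i] if y == 1 else 1.0 - c[i] for i in range(n)]
    post = [like[i] * p[i] for i in range(n)]
    norm = sum(post)
    if norm == 0.0:
        raise ValueError("observation has zero probability under the model")
    post = [w / norm for w in post]
    # Prediction: propagate the corrected distribution through the chain.
    return [sum(post[i] * Q[i][j] for i in range(n)) for j in range(n)]
```

The state of the filter is the n-vector `p`, which is why the filter is finite dimensional; when `Q` depends on unknown parameters, the same function can be called with `Q` built from the current parameter estimate.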


In  general,  the  adaptive  estimation  problem  involves  the  computation 
of  estimates  (e.g.,  state  estimates)  in  the  presence  of  unknown  parameters; 
in  addition,  estimates  of  the  parameters  are  often  computed  simultaneously. 

In  the  present  context,  the  adaptive  estimation  problem  is  that  of  computing 
recursive  estimates  of  the  conditional  probability  vector  when  the  transition 
matrix $Q$ is not completely known (i.e., it depends on a vector of unknown
parameters $\theta$). The approach to this problem which we investigate in
[8] has been widely used in linear filtering: we use the previously derived
recursive filter for the conditional probabilities, and we simultaneously
estimate the parameters recursively, plugging the parameter estimates into
the filter to obtain an approximate conditional probability vector
$\hat{p}_{t|t-1}$.

The  recursive  parameter  estimator  has  the  form 

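$$e_t = y_t - \gamma^T \hat{p}_{t|t-1} \tag{1}$$

$$\theta_t = \theta_{t-1} + a_t R_t^{-1} \psi_t e_t \tag{2}$$

where $\gamma = [\gamma_1, \ldots, \gamma_n]^T$, $\{a_t\}$ is a sequence of
positive scalars, $R_t$ is a positive definite matrix which modifies the
search direction, and $\psi_t$ is an approximation of the gradient of $e_t$
with respect to $\theta$ (evaluated at the current parameter estimate).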
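
The structure of such a recursive estimator can be illustrated in the scalar case. This is only a sketch under stated assumptions: the function names are hypothetical, `R` is a positive scalar standing in for the positive definite matrix, and `psi` is taken to be the gradient of the prediction (that is, minus the gradient of the error), which is the sign convention under which a step with a positive gain reduces the squared prediction error.

```python
def estimator_step(theta, y, y_pred, psi, a, R=1.0):
    """One scalar step: theta + a * (1/R) * psi * e, with e = y - y_pred.

    psi is ASSUMED to be the gradient of the prediction y_pred with
    respect to theta (minus the gradient of e); R > 0 rescales the step.
    Returns the new estimate and the prediction error.
    """
    e = y - y_pred
    return theta + a * psi * e / R, e


# Toy usage (hypothetical model): the prediction is theta itself, so psi = 1,
# and with a_t = 1/t the recursion reduces to the running sample mean.
def running_estimate(ys, theta0=0.0):
    theta = theta0
    for t, y in enumerate(ys, start=1):
        theta, _ = estimator_step(theta, y, y_pred=theta, psi=1.0, a=1.0 / t)
    return theta
```

In the adaptive estimation problem the prediction would instead be computed from the filter output, so the prediction and its gradient both depend on the whole history of parameter estimates; this coupling is what makes the convergence analysis nontrivial.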

The  recursive  stochastic  algorithm  we  have  described  is  of  the 
general  form 

$$\eta_{k+1} = \eta_k + a_k G(\eta_k, \xi_k). \tag{3}$$

We follow the approach of Kushner to the Ordinary Differential Equation
(ODE) Method of analyzing (3). That is, we define
$t_k = \sum_{i=1}^{k-1} (1/i)$ and $m(t) = \max\{k : t_k \le t\}$; thus
$m(t_k) = k$. Let $\eta^0(\cdot)$ denote the piecewise linear
interpolation of the function with value $\eta_k$ at $t_k$. Define the
shifted function $\eta^k(\cdot)$ by $\eta^k(t) = \eta^0(t + t_k)$,
$t \ge 0$. Thus $\eta^k(0) = \eta_k$, and
the idea is to show weak convergence as $k \to \infty$ of the sequence
$\{\eta^k(\cdot)\}$ to the solution of an ODE, which can then be used to
conclude properties (such as convergence as $t \to \infty$) of the
parameter estimates $\theta_t$. The essential assumption is that
$\{\xi_k\}$ depends on $\{\eta_k\}$ in such a way that if
$\eta_k \equiv \eta$, a constant, then $\{\xi_k\}$ has a unique invariant
(or stationary) measure. In our problem, $\xi_k$ is a Markov process, and
most of the paper [8] is devoted to an analysis that establishes that it
has a unique invariant measure for fixed $\eta$ (i.e., for fixed
$\theta$). This work represents
the  first  major  investigation  into  nonlinear  adaptive  stochastic  control 
with  incomplete  observations. 
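
The time scale and the interpolated and shifted processes used in the ODE method can be made concrete with a short sketch. It assumes gains $1/i$, indices starting at $k = 1$, and the convention that $m(t)$ is the largest $k$ with $t_k \le t$; the function names are hypothetical, and index 0 of each list is left unused so that list indices match the subscripts in the text.

```python
def time_scale(K):
    """Return a list t with t[k] = sum_{i=1}^{k-1} 1/i for k = 1..K."""
    t = [0.0, 0.0]            # t[0] is unused padding; t[1] = 0 (empty sum)
    for k in range(2, K + 1):
        t.append(t[-1] + 1.0 / (k - 1))
    return t


def m(t, s):
    """Largest k with t[k] <= s (for s >= 0)."""
    k = 1
    while k + 1 < len(t) and t[k + 1] <= s:
        k += 1
    return k


def interpolate(t, eta, s):
    """Piecewise-linear interpolation with value eta[k] at time t[k]."""
    k = m(t, s)
    if k + 1 >= len(t):       # beyond the last node: hold the last value
        return eta[k]
    w = (s - t[k]) / (t[k + 1] - t[k])
    return (1.0 - w) * eta[k] + w * eta[k + 1]


def shifted(t, eta, k, s):
    """Shifted process: the interpolation started at time t[k]."""
    return interpolate(t, eta, s + t[k])
```

Because the increments $1/i$ shrink, later iterates are packed ever more densely on the interpolated time axis, so each shifted path looks increasingly like a solution of the limiting ODE over any fixed time window.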

C.  Other  Research 

We  have  completed  work  in  other  areas  which  were  initially  supported 
under Grant AFOSR-79-0025. In [7], asymptotic approximations for some
nonlinear  filtering  problems  were  derived,  analyzed,  and  compared  with 
other  filters.  Lie  algebraic  and  analytical  methods  were  utilized;  of 
particular  interest  was  the  estimation  problem  for  linear  systems  with 
infrequently  jumping  parameters.  In  [5]  and  [6],  the  structure  of 
nonlinear  control  systems  with  symmetries  was  studied,  and  the  results 
obtained  were  used  to  derive  reduced-order  algorithms  for  certain 
classes  of  optimal  control  problems. 

In  addition,  we  have  initiated  research  on  the  control  of  nonlinear 
discrete-time  systems  with  the  paper  [9],  in  which  we  derive  necessary 
and  sufficient  conditions  for  the  approximate  and  local  linearizability 
of  such  systems. 